# **GLOBAL JOURNAL OF ENGINEERING SCIENCE AND RESEARCHES**

REVIEW OF LOW-LATENCY SUCCESSIVE-CANCELLATION POLAR DECODER ARCHITECTURE

VENGADESH.P\*1, DEIVAKANI.M2

PG Scholar<sup>\*1</sup>,Assistant professor<sup>2</sup>

Department of Electronics and Communication Engineering, PSNA College of Engineering and Technology,

#### Dindigul, Tamilnadu, India \*1,2 ABSTRACT

Polar codes become one in every of the necessary error correction codes attributable to their capacity-achieving property. Successive cancellation (SC) rule is seen as a decent contestant for hardware style of polar decoders attributable to its low quality. During this project, we tend to gift a unique reformulation for the last stage of SC decryption. The proposal offers 2 edges. First, essential path and hardware quality within the last stage of SC rule is considerably reduced. Second, two bits are decoded at the same time rather than one bit. As a result, this new decoder noted as 2b-SC-Precomputation decoder reduces latency while not performance loss. The planned polar decoder design provides vital blessings in terms of turnout and hardware potency.

Key words—Look-ahead, polar codes, precomputation, successive cancellation, 2-bit decoder.

### I. INRODUCTION

Polar codes, because the 1st obvious capacity-achieving codes over binary-input distinct memory less channel (B-DMC) [1], have received vital read among varied forward error correction (FEC) codes. Attributable to their express structure and low-complexity encoding/decoding theme, polar codes became significant codes in committal to writing theory. To date, several efforts have self-addressed many theoretical aspects of polar codes [2]–[9]. However, with the exception of [10]–[14], [19], not several publications have thought of the VLSI style of polar decoders. In [10], associate FPGA implementation of polar decoder supported the belief-propagation (BP) rule was enforced. Though BP decoder has specific benefits in parallel style, attributable to the necessity of enormous variety of process parts (PES), the BP decoder isn't most well-liked for sensible applications. In [11], [12], [19], Successive cancellation (SC) polar decoders were rumored. This low-complexity architectures area unit appropriate for area-stringent applications; but, attributable to the inherent serial nature of the SC rule, these SC decoders fall attributable to long latency and low turnout. In [13], [14], a precipitation theme was additional to the SC rule, that improved in reducing the latency from (2n-2) to (n-1). However, considering the penalty of enlarged hardware, the SC- Precipitation decoder doesn't show vital improvement in terms of hardware potency.

There are a unit 3 varieties of new SC decoder architectures. First, a unique reformulation of the last stage of the initial SC decoder permits 2 bits to be decoded within the same clock cycle, that reduces latency from (2n-2) to (1.5n-2) this design is brought up because the 2b-SC decoder. Second, the employment of overlapped programming technique [15] once more reduces the latency to (n-1) this design is brought up because the 2b-SC-Overlapped-SCheduling decoder. Third, the employment of precipitation [16]–[18] and look-ahead [17], [18] techniques any reduces the latency from (n-1) to (3n/4-1). This design is is aware of to because the 2b-SC-Precomputation decoder. This becomes the smallest amount doable latency is (n-1) therefore, the (3n/4-1) latency of the planned 2b-SC-Precomputation decoder is that the least among all glorious architectures.

#### II. SURVEY OF RELATED WORK

Amin Alamdar-Yazdi et, al a modification is introduced of the successive cancellation decoder for polar codes, in which local decoders for rate-one constituent codes are simplified. This modification reduces the decoding latency and algorithmic complexity of the conventional decoder, while preserving the bit and block error rate. Significant latency and complexity reductions are achieved over a wide range of code rates.

Camille Leroux, Alexandre J. Raymond et, al propose a semi-parallel architecture for the implementation of successive cancellation decoding. It takes advantage of the recursive structure of polar codes to make efficient use of processing resources. The derived architecture has a very low processing complexity while the memory complexity remains similar to that of previous architectures. This drastic reduction in processing complexity allows very large polar code decoders to be implemented in hardware. A polar code successive cancellation decoder is implemented in an FPGA.

Bin Li, Hui Shen, David Tse et, al propose an adaptive SC (Successive Cancellation)-List decoder for polar codes with CRC. This adaptive SC-List decoder iteratively increases the list size until at least one survival path can pass CRC. Simulation shows that the adaptive SC-List decoder provides significant complexity reduction. It also demonstrate that polar code (2048, 1024) with 24-bit CRC decoded by proposed adaptive SC-List decoder with very large maximum list size can achieve a frame error rate  $FER \le 10 - 3$  at Eb/No = 1.1dB, which is about 0.25dB from the information theoretic limit at this block length.

Gabi Sarkis and Warren J. Gross et, al an improved version of the simplified successive-cancellation decoding algorithm that increases decoding throughput without degrading the error-correction performance. It show that the proposed algorithm has up to three times the throughput of the simplified successive-cancellation decoding algorithm



36

and up to twenty-nine times the throughput of a standard successive cancellation decoder while using the same number of processing elements.

Chuan Zhang and Keshab K. Parhi et, al data-flow graph (DFG) for the SC decoder is derived. Complete hardware architecture is first derived for the conventional tree SC decoder and the feedback part is presented next. Precomputation look-ahead technique is exploited to reduce the achievable minimum decoding latency. Substructure sharing is used to design a merged processing element (PE) for higher hardware utilization. In order to meet throughput requirements for a diverse set of application Scenarios, a systematic approach to construct different overlapped SC polar decoder architectures is also presented.

Dong-Min Shin, Seung-Chan Lim, and Kyeongcheol Yang et, al propose a method to construct lengthcompatible polar codes by employing the reduction of the 2n to 2n polarizing matrix proposed by Arıkan. The conditions under which a reduced matrix becomes a polarizing matrix supporting a polar code of a given length are first analyzed. Based on these conditions, length-compatible polar codes are constructed in a suboptimal way by codewordpuncturing and information-refreezing processes. They have low encoding and decoding complexity since they can be encoded and decoded in a similar way as a polar code of length 2n. Numerical results show that lengthcompatible polar codes designed by the proposed method provide a performance gain of about 1.0 - 5.0 dB over those obtained by random puncturing when successive cancellation decoding is employed.

Zhiliang Huang et, al an improvement is introduced of the modified successive-cancellation (MSC) decoding for polar codes, in which local decoders at rate-other nodes are simplified. This improvement reduces the decoding latency and complexity of the MSC decoding with no sacrifice in error performance. Furthermore, in our method, the decoding latency decreases rapidly with increasing signal-to-noise ratio.

Chuan Zhang and Keshab K. Parhi et, al present the first systematic approach to formally derive the SSC decoding latency for any given polar code. The method to derive the SSC polar decoder architecture for any specific code is also presented. Moreover, the architecture of the precomputation SSC polar decoder is also proposed, which can further reduce the decoding latency.

Yingxian Zhang, Aijun Liu, Xiaofei Pan, Zhan Ye, and Chao Gong et, al a modified belief propagation (BP) polar decoder is proposed. Unlike the original BP polar decoders, a check node is added to each node of the proposed decoder. In the BP decoding, the propagated messages of the nodes are modified by multiplying the messages from the check nodes, so as to enhance the reliability of these propagated messages. Numerical results show that the proposed decoder achieves better performance than that of the original BP decoders, only at cost of some additional multiplications, which indicates its effectiveness.

YouZhe Fan and Chi-ying Tsui et, al an efficient PSN architecture is proposed, based on the properties of polar codes. First, a new partial-sum updating algorithm and the corresponding PSN architecture are introduced which achieve a delay performance independent of the code length. Moreover, the area complexity is also reduced. Second, for a high-performance and area-efficient semi-parallel SCD implementation, a folded PSN architecture is presented to integrate seamlessly with the folded processing element architecture. This is achieved by using a novel folded decoding Schedule. As a result, both the critical path delay and the area (excluding the memory for folding) of the semi-parallel SCD are approximately constant for a large range of code lengths. The proposed designs are implemented in both FPGA and ASIC and compared with the existing designs.

## III. PROPOSED METHOD

## HARDWARE ARCHITECTURES OF 2B-SC DECODER

According to Fig.1, the general 2b-SC decoder has 3 sorts of process nodes: f, g and p nodes.

#### A. Processing Element (PE) for f and g Nodes

As shown in Fig.1, p nodes square measure utilized in stage-m, and f and g nodes square measure utilized in different stages to calculate the propagated LLR values. For simplicity of hardware style, the functions of f and g nodes square measure continuously enforced by unified process components (PEs) [11], [12]. Fig.2 shows the design of this letter of the alphabet supported the LLR version of (16) and (23) with min-sum approximation. Here S2C is that the block that converts from sign-magnitude kind to 2's complement kind, whereas C2S unit will the inverse conversion to boot, adder and sub tractor square measure there to hold out addition and subtraction between the 2 inputs. The corresponding total and distinction square measure hand-picked by the partial total signal  $u_{sum}$  from the PSG block. Finally, at the output finish of the letter of the alphabet, management signal is employed to work out the output as LLR (a) or LLR (b) that is propagated to consecutive stage. In summary, the design shown in Fig2. a pair of has the subsequent blocks one comparator-selector, one adder, one subtracts or, 2 multiplexers, 2 C2S and 2 S2C blocks. Consequently, the essential path delay of letter of the alphabet is  $T_{s2c}+T_{adder}+T_{c2s}+2T_{mux}$ .





Fig.1:The decoding procedure of 2b-SC algorithm with N=8



Fig. 2: The architecture of PE for f and g nodes.

## B. P Node

Since the operate of p node depends on the frozen conditions Of  $u_{2i-1}$  and  $u_{2i}$ , signals frozen1 and frozen2 square measure introduced to point whether or not  $u_{2i-1}$  and  $u_{2i}$  square measure frozen bits or not. If  $u_{2i-1}$  is frozen, frozen1 are going to be one, otherwise zero. Similarly, frozen2 are going to be one or zero once  $u_{2i}$  is frozen or not. Secondly, the sign bits of LLR(c) and LLR (d) square measure utilized for simplifying computations. Denoted as sign LLR(c) and sign LLR (d) these sign bits are going to be, severally, zero or one once the corresponding LLR values square measure non-negative or negative. what is more, the comp signal, that is that the results of comparison between definite quantity of LLR (c) and LLR (d), is additionally utilized. once |LLR(c)| > |LLR (d)| comp are going to be one, otherwise zero. we will get the reality table shown in Table I for  $u_{2i-1}$  and  $u_{2i}$ 

| THE TRUTH | TABLE | OF P | NODE |
|-----------|-------|------|------|
|-----------|-------|------|------|

| Input   |         |              |              | Output     |                   |                 |
|---------|---------|--------------|--------------|------------|-------------------|-----------------|
| Frozen1 | Frozen2 | Sign(LLR(c)) | Sign(LLR(d)) | comp       | U <sub>2i-1</sub> | U <sub>2i</sub> |
| 0       | 0       | 0            | 0            | don't care | 0                 | 0               |



#### [Vengadesh, 1(10): December, 2014]

| 0 | 0 | 1          | 1          | don't care | 0 | 1 |
|---|---|------------|------------|------------|---|---|
| 0 | 0 | 1          | 0          | don't care | 1 | 0 |
| 0 | 0 | 0          | 1          | don't care | 1 | 1 |
| 1 | 1 | don't care | don't care | don't care | 0 | 0 |
| 1 | 0 | 1          | 1          | don't care | 0 | 1 |
| 1 | 0 | 1          | 0          | 1          | 0 | 1 |
| 1 | 0 | 1          | 0          | 0          | 0 | 0 |
| 1 | 0 | 0          | 1          | 0          | 0 | 1 |
| 1 | 0 | 0          | 1          | 1          | 0 | 0 |
| 1 | 0 | 0          | 0          | don't care | 0 | 0 |
| 0 | 1 | 0          | 0          | don't care | 0 | 0 |
| 0 | 1 | 1          | 1          | don't care | 0 | 0 |
| 0 | 1 | 0          | 1          | don't care | 1 | 0 |
| 0 | 1 | 1          | 0          | don't care | 1 | 0 |



Fig.3: The architecture of p node.

Here LLR(c) and LLR (d) square measure painted in sign-magnitude (SM) kind, and that they square measure output from the f and g nodes in stage-(m-1). Additionally, since the frozen conditions of  $(u_{2i-1})$  and  $(u_{2i})$  are preset before the transmission, the management unit provides the signals of frozen1 and frozen2.

#### C. Overall Architecture for 2b-SC Decoder

Based on the circuits of the letter of the alphabet and also the node in Figs.2 and 3, severally, the general 2b-SC decoders are often engineered as a butterfly-like design (Fig.1). However, this fashion of style isn't hardware-efficient. For the design in Fig.1, a minimum of half nodes in every stage square measure continuously idle throughout cryptography procedure. Therefore, to extend hardware Utilization, the subsequent tree-based architectures and line-based architectures [11], [13], square measure sometimes accustomed construct overall SC decoder. Fig.4 shows the design of a tree-based 2b-SC decoder with n=8. During this style, once a selected stage is activated, all the nodes in this stage square measure activated. Therefore, a complete of PEs and one p node square measure required.



(C) Global Journal Of Engineering Science And Researches

#### [Vengadesh, 1(10): December, 2014]

#### ISSN 2348 - 8034

The main disadvantage of tree-based design is that solely the activated stages are able to do 100% hardware utilization in every cycle. Considering the waste of idle resource, the line-based 2b-SC design is employed, it merges stages into one stage, is illustrated in Fig.5. During this figure, the numbers related to the switches indicate the time index once the switches are going to be turned on. Compared with the tree-based design the line-based design is engaging for moderate-speed applications because of its low hardware value and higher hardware utilization potency.



Fig.4. The tree-based 2b-SC architecture with n=8



Fig. 5. The line-based 2b-SC architecture with n=8.



#### **D.** 2b-SC-Precomputation Architecture

In [13], precomputation technique [16]–[18] was exploited to cut back the general latency of the initial SC algorithmic program. The thought of this methodology is to merge the computation of f and g nodes within the same stage because of this the general latency is five hundredth but that of the standard theme what is more, so as to implement the precomputation theme, [13] projected to use incorporated PEs (see Fig. 6). That is completely different from standard 2-input 1-output letter of the alphabet (Fig. 2), this changed 2-input 3-output letter of the alphabet will calculate the precise output off node and a couple of output of g node at constant time. The valid output of the node is chosen and propagated to consecutive stage once corresponding  $u_{sum}$  is accessible. For details of the SC-Precomputation algorithmic program and design, the reader is referred to [13].

Although SC-Precomputation decoder in [13] has saved half the clock cycles, with the assistance of the reformulation (p node) of the last stage, additional reduction on latency are often obtained. Recall that the operate of the p node is to output a pair of bits in one cycle; thus, the incorporated computations for f and g nodes within the last stage of SC-Precomputation theme are often fully replaced by the p node. Additionally, since the essential path of the p node is brief, the computation of p node in adjacent cycles is often incorporated into one cycle.



Fig. 6. The architecture of merged PE for SC-Precomputation decoding in [13].

When merging 2 sequent computations of p nodes into one cycle, a possible drawback is that the increase of essential path delay. As a result of the longest knowledge path between 2 sequent computations of p nodes is longer than that within the incorporated letter of the alphabet in Fig. 6, a simple implementation of the merge operation can increase the essential path delay. To unravel this drawback, look-ahead technique [17], [18] is applied to the last stage. Associate in nursing example of this reformulation is illustrated in Fig. 7. By exploitation look-ahead technique, the essential path of the last stage is reduced from  $2T_{pnode}+T_{PSG}+T_{mux}$  in Fig. 7(a) to  $T_{pnode}+T_{PSG}+T_{4-1mux}$  in Fig.7 (b), that is smaller than the longest path delay



Fig. 7: (a)Original design for two successive computations of pnodes in the last stage (stage-m). (b) Look-ahead reformulation.

#### IV. CONCLUSION

In our approach a completely unique reformulation for the last stage of the SC decoding is planned. supported this reformulation, a reduced-Latency 2b-SC decipherment algorithmic rule is given. additionally, with the employment of precomputation approaches, the decipherment latency of 2b-SC style is additional reduced. Analysis shows that the planned 2b-SC architectures have vital blessings with regard to each output and hardware potency. Future work are going to be directed towards style of polar list decoders exploitation our planned 2-bit decipherment approach

41



#### REFERENCES

- [1] E. Arıkan, "Channel polarization: A method for constructing capacityachieving codes for symmetric binaryinput memorylesschannels," IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
- [2] S. B. Korada, E. Sasoglu, and R. Urbanke, "Polar codes: Characterization of exponent, bounds, and constructions," IEEE Trans. Inf. Theory, vol. 56, no. 12, pp. 6253–6264, 2010.
- [3] R.Mori and T. Tanaka, "Performance of polar codes with the construction using density evolution," IEEE Commun. Lett., vol. 13, no. 7, pp.519–521, Jul. 2009.
- [4] I. Tal and A.Vardy, "How to construct polar codes," May 2011, arXiv: 1105.6164v1.
- [5] A. Alamdar-Yazdi and F. R. Kschischang, "A simplified successive cancellation decoder for polar codes," IEEE Commun. Lett., vol. 15, no. 12, pp. 1378–1380, 2011.
- [6] I. Tal and A. Vardy, "List decoding of polar codes," in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2011, pp. 1–5.
- [7] K. Niu and K. Chen, "Stack decoding of polar codes," Elect. Lett., vol. 48, no. 12, pp. 695–696, 2012.
- [8] E. Arıkan, "Systematic polar coding," IEEE Commun. Lett., vol. 15, no. 8, pp. 860-862, 2011.
- [9] E. Arıkan, "Polar codes: A pipelined implementation," in Proc. 4th Int. Symp. on Broad. Commun. ISBC 2010, Jul. 2010, pp. 11–14.
- [10] A. Pamuk, "An FPGA implementation architecture for decoding of polar codes," in Proc. 8th Int. Symp. on Wireless Commun. Syst. (ICWCS), Nov. 2011, pp. 437–441.
- [11]C. Leroux, I. Tal, A. Vardy, andW. J. Gross, "Hardware architectures for successive cancellation decoding of polar codes," in Proc. IEEE ICASSP, May 2011, pp. 1665–1668.
- [12] C. Leroux, A. J. Raymond, G. Sarkis, and W. J. Gross, "A semi-parallel successive-cancellation decoder for polar codes," IEEE Trans. Signal Processing, vol. 61, no. 2, pp. 289–299, Jan. 2013.
- [13] C. Zhang, B. Yuan, and K. K. Parhi, "Reduced-latency SC polar decoder architectures," in Proc. Int. Conf. Commun., June 2012, pp. 3471–3475.
- [14] C. Zhang and K. K. Parhi, "Low-latency sequential and overlapped architectures for successive cancellation polar decoder," IEEE Trans. Signal Processing, vol. 61, no. 10, pp. 2429–2441, May 2013.
- [15] K. K. Parhi and D. G. Messerschmitt, "Static rate-optimal scheduling of iterative data flow programs via optimum unfolding," IEEE Trans. Comput., vol. 40, no. 2, pp. 178–195, Feb. 1991.
- [16] K. K. Parhi, "Pipelining in algorithms with quantizer loops," IEEE Trans. on Circuits and Systems, vol. 38, no. 7, pp. 745–754, Jul. 1991.
- [17] K. K. Parhi, "Design of multi-gigabit multiplexer loop based decision feedback equalizers," IEEE Trans. VLSI Syst., vol. 13, no. 4, pp. 489–493, Apr. 2005.
- [18]K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York, NY, USA: Wiley, 1999.
- [19] A. Mishra, A. Raymond, L. Amaru, G. Sarkis, C. Leroux, P. Meinerzhagen, A. Burg, andW. J.Gross, "A successive cancellation decoder ASIC for a 1024-bit polar code in 180 nmCMOS," in Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC), 2012, pp. 205–208.
- [20] Z. Cui and Z. Wang, "A 170 Mbps (8176, 7156) quasi-cyclic LDPC decoderimplementation with FPGA," in Proc. IEEE Int. Symp. Circuits Syst., May 2006, pp. 5095–5098.
- [21]H.-Y. Hsu, A.-Y. Wu, and J.-C. Yeo, "Area-efficient VLSI design of reed-solomon decoder for 10GBased-LX4 optical communication systems," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 11, pp. 1245–1249, Nov. 2006.
- [22] D. Oh and K. K. Parhi, "Minsum decoder architecture with reduced word-length for LDPC codes," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 1, pp. 105–115, Jan. 2010.

42

